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LONG-TERM  GOALS 

An  over- arching  goal  in  prediction  science  is  to  objectively  improve  numerical  models  of  nature. 
Meeting  that  goal  requires  objective  quantification  of  deficiencies  in  our  models.  The  structural 
differences  between  a  numerical  model  and  a  true  system  are  difficult  to  ascertain  in  the  presence  of 
multiple  sources  of  error.  Numerical  weather  prediction  (NWP)  is  subject  to  temporally  and  spatially 
varying  error,  resulting  from  both  imperfect  atmospheric  models  and  the  chaotic  growth  of  initial- 
condition  (IC)  error.  The  aim  of  our  work  is  to  provide  a  method  that  begins  to  systematically 
disentangle  the  model  inadequacy  signal  from  the  initial  condition  error  signal. 

OBJECTIVES 

We  are  engaging  a  comprehensive  effort  that  uses  state-of-the-science  estimation  methods  in  data 
assimilation  (DA)  and  statistical  modeling,  including:  (1)  the  characterization  of  existing  model-to- 
model  differences  via  novel  spatial  Analysis  of  Variance  (ANOVA)  methods;  (2)  the  development  of  a 
flexible  representation  for  the  various  spatial  and  temporal  scales  of  model  error;  (3)  the  estimation  of 
parameters  in  representing  those  scales  using  a  probabilistic  approach  to  DA,  namely  the  Ensemble 
Kalman  Filter;  and  (4)  the  determination  of  whether  incorporation  of  estimated  error  structure  in 
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improves  short-term  forecasts,  again  using  spatial  ANOVA  methods,  this  time  within  a  formal  testing 
framework.  Research  focus  is  on  near-surface  winds  over  both  the  ocean  and  land.  The  method 
under  development  are  sufficiently  general  and  can  apply  to  a  wide  range  of  battlespace  environments. 

APPROACH 

The  technical  approach  includes  numerical  weather  prediction  and  state  estimation  efforts  at  NPS,  and 
statistical  modeling  efforts  at  University  of  California  Berkeley  (UCB)  under  sub-contract.  At  NPS  PI 
Hacker  is  implementing  the  Navy’s  Operational  Global  Atmospheric  Prediction  (NOGAPS),  and  two 
limited-area  mesoscale  models:  the  Navy’s  Coupled  Ocean-Atmosphere  Mesoscale  Prediction  System 
(COAMPS)  model,  and  the  open-source  Weather  Research  and  Forecast  (WRF)  model,  within  a  state- 
of-the-science  ensemble  Data  Assimilation  Research  Testbed  (DART).  The  NOGAPS-DART 
provides  global  ensemble  prediction  capability  that  can  be  consistently  applied  to  the  COAMPS  and 
WRF  as  lateral  boundary  conditions.  Scientific  objectives  will  be  me  by  systematically  choosing  the 
WRF  or  CO  AMPS  as  the  “truth,”  which  can  provide  observations  for  assimilating  into  the  other 
model.  Under  this  approach,  spatio-temporal  distributions  of  uncertainty  (error  in  this  context)  are 
available  for  analysis  with  special  attention  to  second-order  moments.  Eventually,  we  will  use  the 
same  framework  to  objectively  estimate  parameters  in  statistical  models,  of  NWP  model  error, 
developed  at  UCB.  Hypotheses  will  be  formed  and  formally  tested.  This  work  is  benefitting  from 
collaboration  Co-I  James  Hansen,  Justin  McLay,  and  others  NRL  staff.  A  budgeted  post-doc  has  not 
been  hired  due  to  delays  in  funding  distributions,  and  partial  increments. 

UCB  PI  Cari  Kaufman  is  working  to  advance  the  statistical  methods  needed  to  provide  an  objective 
space-time  characterization  of  the  error  distributions.  We  approach  the  characterization  of  the 
uncertainty  via  fitting  a  hierarchical  Bayesian  model  that  captures  the  important  features  and 
variability  in  the  data.  The  implied  distribution  from  the  model  will  be  a  valid  stochastic  spatial 
process  under  probability  theory.  Ideally,  fitting  the  statistical  model  to  different  datasets  should  allow 
us  to  capture  the  significant  differences  between  the  different  underlying  data  generating  distributions. 
Moreover,  a  realistic  statistical  model  can  also  simulate  realistic  wind  fields  quickly  which  can  be 
beneficial  for  studying  other  processes  that  require  surface  winds  as  an  input.  Graduate  student  Wayne 
Lee  (unfunded)  is  contributing  substantially  to  this  work.  Postdoc  Benjamin  Shaby  began  work  in 
September  2011. 

WORK  COMPLETED 

Work  in  FY201 1  was  toward  various  components  of  tasks  1  and  2.  At  NPS  we  focused  on  the 
technical  implementations  of  the  tools  required  to  complete  the  research.  This  included 
implementation  of  NOGAPS,  COAMPS,  and  WRF  with  DART.  NOGAPS-DART  and  WRF-DART 
is  completed.  COAMPS-DART  has  been  implemented  at  NRL  but  not  yet  at  NPS.  NOGAPS-DART 
was  tested  for  the  period  of  Oct  2009  and  results  compared  against  the  operational  global  data 
assimilation  system  used  at  FNMOC  to  ensure  quality.  In  the  WRF-DART  implementation,  simple 
model  error  terms  have  been  added  within  the  WRF  code  in  preparation  for  estimating  the  coefficients 
of  the  model-error  process  estimated  as  described  below. 

At  UCB  focus  was  on  addressing  challenges  associated  with  applying  heirarchical  Bayesian  techniques 
to  large,  multivariate,  and  non-stationary  datasets  typical  of  NWP.  Progress  is  documented  in  the 
following  results  section. 
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RESULTS 


Primary  results  are  the  development  of  a  viable  statistical  model  for  spatial  variability  of  near-surface 
surface  winds.  This  represents  an  important  step  toward  a  model  capable  of  characterizing  the 
comlexity  of  model  errors  in  those  winds.  Here  we  describe  how  we  successfully  addressed  three 
primary  challenges  associated  with  this  model.  Results  represent  a  significant  step  toward  objectively 
characterizing  the  time-space  structures  of  model  inadequacy. 

Results  are  focused  on  the  surface-wind  forecasts  from  the  WRF.  A  dataset  was  prepared  at  NPS  from 
existing  ensemble  prediction  runs  (no  ensemble-filter  data  assimilation)  to  minimize  delays  on  UCB 
progress.  In  this  case,  the  WRF  lateral  boundary  conditions  were  provided  by  the  Global  Forecast 
System  (GFS)  operational  model  from  the  National  Centers  for  Environmental  Prediction.  Although 
we  anticipate  the  primary  research  will  be  completed  over  the  Korean  Peninsula  and  surrounding  seas, 
this  example  data  set  is  over  CONUS.  An  example  wind  field  is  shown  in  Fig.  1. 


Figure  1:  Example  surface  wind  field.  Data  are  thinned  to  20%  of  the  original  resolution. 

We  approach  the  characterization  of  the  uncertainty  via  fitting  a  hierarchical  Bayesian  model  that 
captures  the  important  features  and  variability  in  the  data.  The  implied  distribution  from  the  model  will 
be  a  valid  stochastic  spatial  process  under  probability  theory.  Ideally,  fitting  the  statistical  model  to 
different  datasets  should  allow  us  to  capture  the  significant  differences  between  the  different 
underlying  data  generating  distributions.  Moreover,  a  realistic  statistical  model  can  also  simulate 
realistic  wind  fields  quickly  which  can  be  beneficial  for  studying  other  processes  that  require  surface 
winds  as  an  input. 

We  face  three  statistical  challenges  with  surface  wind  fields:  (1)  with  roughly  12,000  spatial  locations 
over  North  America  over  two  months  observed  every  2  days  (32  forecasts),  it  is  a  large  data  set  and 
poses  a  computational  problem  for  classical  geostatistical  methods,  (2)  winds  are  represented  as  a  sum 
of  3  components  U ,  V ,  and  W,  introducing  a  multivariate  spatial  process  problem  (currently  we  ignore 
vertical  velocity  W  in  our  study  since  its  magnitude  and  variability  is  negligible);  (3)  surface  wind 
fields  exhibit  great  nonstationarity  and  some  discontinuity  depending  on  the  underlying  topography. 
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Classical  stationary  methods  tend  to  model  the  first-order  trends  well,  but  fail  to  capture  the  second 
order  uncertainty  which  is  the  primary  quantity  of  interest. 

We  first  tackle  the  multivariate  issue  by  leveraging  the  geostrophic  wind  relationship  linking  the 
pressure  gradient  to  U  and  V under  frictionless  environments. 


p  OX 


pdy 


where /is  the  Coriolis  parameter,  p  is  air  density,  x  is  the  longitude,  y  is  latitude,  and p  is  pressure. 
This  basic  relationship  is  a  linear  relationship  between  pressure  gradient  and  wind  speeds  that  can 
simplify  the  joint  model  between  U  and  V,  so  that  it  can  be  explained  by  a  single  stationary  pressure 
field.  After  some  exploration,  we  decided  on  the  form: 


ax  ay 

=  Pvfi{s)+P  {s)^-{s)+p  ^-(s)+ev(s) 
ax  ay 


where  coefficients  /?  are  functions  of  location  s,  and  fare  residuals.  Royle  et  al.  (1999)  modeled  ocean 
wind  fields  as  above  with  spatially  constant  coefficents  /.  However,  the  interpretation  of  [i  based  on 
the  geostrophic  relationship  suggests  that  /  should  vary  spatially  by  latitude  and  the  local  ratio  of  moist 
vs.  dry  air  instead  of  treated  as  a  fixed  global  quantity.  The  varying  coefficient  model  was  advocated 
by  Gelfand  et  al.  (2003)  to  efficiently  model  first  order  nonstationary  features  for  the  /.  To  quickly 
and  naively  evaluate  this  model,  we  run  a  linear  regression  using  the  32  days  for  each  location  ignoring 
all  spatial  dependency  and  examine  the  estimated  /  and  R-squared  to  evaluate  the  fit  (R-squared  =  1  is 
a  perfect  fit).  A  great  majority  of  the  R-squared  values  are  above  0.8  which  means  more  than  80%  of 
the  variability  in  the  data  can  be  explained  by  the  simple  geostrophic  relationship  without  borrowing 
any  information  from  neighboring  locations.  Figure  2  shows  estimates  of  the  first-order  coefficients.  It 
is  clear  that  the  underlying  topography,  which  is  not  smooth,  dominates  the  estimates.  We  can  expect 
this  result,  and  more  generally  that  the  departures  from  geostrophy  are  strongly  modulated  by  lower¬ 
boundary  forcing  on  the  atmosphere. 


4 


Figure  2:  Estimates  for  first-order  constants  (left)  pu,o  and  (right) 

After  the  first-order  nonstationarity  is  handled,  we  look  at  the  second-order  nonstationarity  in  the 
distribution  of  the  [j  fields.  To  handle  it,  we  propose  to  decompose  each  /?,•  term  above  into  a  linear 
combination  of  three  independent  stationary  spatial  processes: 

PM) = P,o  0) + [/W(.v)]L/y.v)J+ [Elev(s)]lft2  (.v)j, 

where  Land(s)  is  an  indicator  and  Elev(s)  is  the  elevation  for  location  Using  matrix  notation  where 
Land  a  =  Land(s)  and  Elevu  =  Elev(s), 

Cov^fs)]  =  £a#  +  [Land]  [Land] T  +  [Elev]  [Elev] ' . 
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Figure  3:  Estimates  for  second-order  coefficients  (left)  f3UtX  and  (right)  /3v>y. 
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Figure  4:  Estimates  for  second-order  cross-coefficients  (left)  (3luy  and  (right)  J3V>X, 

This  naturally  creates  a  valid  covariance  that  reflects  the  different  topography,  and  the  decomposition 
can  be  extended  to  other  surface  properties  affecting  the  winds.  We  tested  multiple  simulations  to 
ensure  that  the  different  processes  can  be  identified  with  sufficient  data,  approximating  the  posterior 
distribution  for  each  simulated  dataset  using  a  Gibbs  sampler.  To  summarize  the  above  model,  we 
present  a  schematic  of  the  Bayesian  varying  coefficient  hierarchical  model  in  Fig.4.  Here  i  indexes  the 
terms  in  the  varying  coefficient  model:  intercept  and  pressure  gradient  in  the  N-S  and  E-W  directions, 
and  j  indexes  the  three  components  used  in  constructing  the  nonstationary  model  for  the  coefficients, 
corresponding  to  an  intercept,  land,  and  elevation  components. 
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Figure  5:  Schematic  of  hierarchical  Bayesian  model.  Hyperparameters  are  a  priori  independent. 


The  remaining  challenge  is  the  computational  burden  of  the  hierarchical  model.  Lindgren  et  al.  (201 1) 
found  the  Gaussian  Markov  Random  Field  (GMRF)  precision  representation  for  a  variety  of  Matem 
covariance  functions,  a  popular  covariance  family  among  the  spatial  statisticians  for  certain  theoretical 
properties.  GMRF  methods  are  known  for  their  extraordinary  computational  efficiency  due  to  its 
sparse  precision  structure  but  suffer  from  the  inability  to  easily  model  long  range  dependencies  (Wall 
2004)  and  its  requirement  on  specific  boundary  conditions.  We  commented  on  the  published  method’s 
bias  due  to  different  boundary  conditions  and  its  implications  for  parameter  estimation  (Lindgren  et  al. 
2011).  Their  approximation  method  provides  estimates  that  consistently  underestimate  the  parameters 
that  are  consistently  estimable  under  classical  methods.  The  authors  responded  with  the  reassurance 
that  the  bias  may  be  shrunk  if  the  finite  element  mesh  density  was  increased  at  the  cost  of  higher 
computational  requirements  (Lee  and  Kaufman,  2011;  Lindgren  et  al.  2011). 

For  future  directions,  we  are  exploring  the  optimal  MCMC  algorithm  to  run  this  hierarchical  model. 
Traditional  Gibb  sampling  produces  parameters  with  high  autocorrelation  from  the  MCMC  chain.  We 
are  currently  exploring  slice  sampling  and  potentially  hybrid  MCMC.  Once  this  runs,  then  we  will  run 
a  computer  experiment  to  see  if  our  stochastic  model  can  detect  the  changes  in  the  dynamic  equations 
within  the  WRF.  In  other  words,  we  will  introduce  a  synthetic  NWP  model  inadequacy  and  see  if 
fitting  the  bayesian  model  above  before  and  after  the  model  discrepancy  is  introduced  will  produce 
detectable  and  informative  differences. 
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Upon  successfully  modeling  the  differences  between  NWP  models,  we  will  try  to  estimate  he 
descrepancy  via  the  parameter  estimation  framework  under  development  at  NPS.  In  preparation,  we 
have  experimented  with  a  simple  960-variable  model  with  dynamics  analogous  to  atmospheric  waves 
to  determine  the  potential  to  estimate  parameters  with  weak  correlations  to  the  model  state. 

Estimation  of  satellite  radiance  bias  parameters  is  a  particular  problem  displaying  weak  correlations 
with  the  model  state.  It  is  typically  performed  sub-optimally  in  variational  estimation  systems; 
ensemble  estimation  systems  offer  a  more  optimal  solution  but  have  only  been  implemented  assuming 
no  correlation  between  state  and  bias.  This  is  a  limiting  assumption,  and  we  proved  that  the  estimation 
procedure  itself  produces  correlations  which  should  not  be  ignored.  Further,  we  demonstrated  that 
ensemble  filters  are  cabable  of  estimating  the  bias  parameters  without  invoking  that  assumption. 

IMPACT/APPLICATIONS 

The  bulk  of  DoN  day-to-day  operations  rely  on  accurate  predictions  of  winds,  seas,  ceiling,  and 
visibility.  The  focus  of  the  proposed  work  is  to  identify  inadequacies  associated  with  the  modeled 
atmospheric  boundary  layer.  Any  discoveries  that  enable  the  improvement  of  boundary  layer 
modeling  will  ultimately  have  a  positive  impact  on  Navy  warfighters. 

The  proposed  methods  have  the  potential  to  enable  essential  improvement  in  modeling  capability. 
Instead  of  tuning  models  based  on  intuition,  we  are  forming  a  foundation  for  objective  identification  of 
model  errors.  Those  errors  could  immediately  be  accounted  for  in  probablistic  forecast  systems,  and 
also  be  subject  to  physical  interpretation  by  subject  experts. 

RELATED  PROJECTS 

The  MATERHORN  project  (http://www.nd.edu/~dynamics/materhom/index.html ),  funded  by  ONR, 
seeks  to  improve  atmospheric  predictability  over  complex  terrain.  It  is  similarly  focused  on 
predictions  in  the  atmospheric  boundary  layer.  Rather  than  a  focus  on  model  inadequacy, 
MATERHORN  focuses  on  field  programs  aimed  at  improving  models  via  direct  comparison  to 
observations,  and  quantifying  optimal  observing  strategies  for  improving  predictions.  PI  Hacker  is 
using  some  of  the  technical  developments  here  to  aid  that  effort,  and  vice  versa. 
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